Abstract: In this paper, the main features of
parametrical words within a sentiment lexicon are 
determined. The data for the research are client 
reviews in the Russian language taken from the 
bank client rating; the domain under study is bank 
service quality. The sentiment lexicon structure is 
presented; it includes two primary classes (positive 
and negative words) and three secondary classes 
(increments, polarity modifiers, and polarity anti- 
modifiers). This lexicon is used as the main tool for 
the sentiment analysis carried out by two methods: 
the Naive Bayes classifier and the REGEX algo- 
rithm. 

Parametrical words are referred to as the 
words denoting the value of some domain-specific 
parameter, e.g. the client’s time consuming. To 
distinguish the main features of parametrical words, 
the parameters relevant for the bank service quality 
domain are determined. The revised lexicon struc- 
ture is proposed, with a new class (decrements) 
added. The results of the research demonstrate that 
parametrical words express implicit opinions, since 
parameters are not usually named directly in re- 
views. Only a small number of parametrical words 
can be ranged into the primary classes (positive or 
negative), but this ranging is domain-specific. It is 
the parameter that determines the domain specifici- 
ty of such words. Most parametrical words are 
ranged into the secondary classes, and this ranging 
can be considered universal. The parametrical 
words denoting the increase of a parameter should 
be ranged into the increment class, as they intensify 
positive or negative emotions. The parametrical 
words denoting the decrease of a parameter should 
be ranged into the decrement class, as they reduce 
positive or negative emotions. Evident progress
toward sentiment lexicon universalization
can be achieved by classifying parametrical words
within the sentiment lexicon.

Key words: cognitive linguistics, natural 
language processing, sentiment analysis, lexicon, 
domain, parametrical words, increment, decrement. 


1. INTRODUCTION 


Sentiment analysis is one of the rapidly
developing methods of natural language
processing. The first works were published
in the early 2000s (Nasukawa & Yi,
2003; Pang et al., 2002; Turney, 2002; 
Wiebe et al., 2001), and since then much 
has been done in this field. Sentiment lexi- 
cons have been built; algorithms have been 
developed (Gamon et al., 2005; Hu & Liu, 
2004; Liu, 2010; Manning et al., 2008; 
Pang & Lee, 2008). All these successful 
studies were focused on the English lan- 
guage, and it seemed logical to apply their 
results to other natural languages, translat- 
ing the lexicons and modifying the tools 
for syntactic analysis. However, attempts
to build a universal sentiment lexicon, the
principal sentiment analysis tool, have
failed.

A sentiment lexicon is a set of words
which are used to express opinions and
emotions in sentiment documents (reviews,
etc.). It is generally divided into two
classes: positive and negative (Pang et
al., 2002). Numerous experiments have shown
that such a lexicon should be both
language-specific and domain-specific.

The problem of language specificity 
concerns the differences in the morpholog- 
ical structure of natural languages, while 
the problem of domain-specificity is a se- 
mantic one. Some words from sentiment 
lexicons appear domain-specific (Ga- 
napathibhotla & Liu, 2008: 242), e.g. the 
word long can be ranged into the positive
lexicon when evaluating the battery opera- 
tion (the smartphone domain), but it can be 
ranged into the negative lexicon in evaluat- 
ing the client’s time consuming (the bank 
service quality domain). In this paper, such 
ambiguous words as long are called para- 
metrical words. 
Parametrical words are referred to as
the words denoting the amount of some 
domain-specific parameter (battery life, the 
client’s time consuming, etc.). 

The purpose of this paper is to de- 
termine the main features of parametrical 
words within a sentiment lexicon. 


2. MATERIALS AND METHODS 


The data for the research are the cli- 
ent reviews in the Russian language on 
bank service quality from the bank client 
rating taken from (www.banki.ru). The
domain under study is bank service quality. 
To build a sentiment lexicon, 20 reviews 
(10 positive and 10 negative ones) were 
randomly selected. From this content, the 
seed, i.e. basic, lexicon containing 100 
words was constructed manually. Then the 
seed lexicon was extended up to about 500 
words, using synonyms, antonyms, and the 
sentiment consistency technique (Liu,
2010). This technique, first proposed by
Hatzivassiloglou & McKeown (1997), uses
a list of seed opinion adjectives
and a set of linguistic constraints (and, but, 
either-or, neither-nor) to identify other 
opinion words and their polarity. For in- 
stance, in the sentence This i-phone is 
beautiful and easy to use, if beautiful is 
known to be positive, it can be inferred that 
easy is also positive. On the contrary, in 
the sentence This i-phone is beautiful, but 
expensive, if beautiful is known to be posi- 
tive, it can be inferred that expensive is 
negative. The seed words with the linguistic
constraints were entered into the Google
search engine, with the search restricted
to (www.banki.ru).
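
The polarity inference described above can be
illustrated with a minimal sketch. It is not the
procedure used in the study (the lexicon was
extended manually with search-engine queries);
the function name propagate_polarity, the
+1/-1 encoding, and the restriction to the
conjunctions and/but are assumptions made for
illustration only.

```python
from collections import deque

# Assumption: "and" links words of the same polarity, "but" links opposite ones.
SAME = {"and"}
OPPOSITE = {"but"}

def propagate_polarity(seeds, links):
    """Infer the polarity (+1 or -1) of unknown words from seed words.

    seeds: dict mapping a word to +1 (positive) or -1 (negative)
    links: iterable of (word_a, conjunction, word_b) observed in texts
    """
    # Build an undirected graph; each edge carries +1 (same) or -1 (opposite).
    graph = {}
    for a, conj, b in links:
        if conj in SAME:
            sign = 1
        elif conj in OPPOSITE:
            sign = -1
        else:
            continue  # other constraints are ignored in this sketch
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))

    # Breadth-first propagation from the seeds; the first inference wins.
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other, sign in graph.get(word, []):
            if other not in polarity:
                polarity[other] = polarity[word] * sign
                queue.append(other)
    return polarity

lexicon = propagate_polarity(
    {"beautiful": 1},
    [("beautiful", "and", "easy"), ("beautiful", "but", "expensive")],
)
print(lexicon)
```

With the two example sentences, easy inherits the
positive polarity of beautiful, and expensive
receives the opposite one.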

All the words in the lexicon were 
stemmed for easier processing. 

The structure of the lexicon is pre- 
sented in Table 1. 


Table 1. The structure of the sentiment lexicon (bank service quality)

Primary classes
  Positive: безопасный (safe), бесплатный (free),
    вежливый (polite), компетентный (competent),
    четкий (clear), эффективный (efficient)
  Negative: агрессивный (aggressive),
    безвыходный (hopeless), грубый (rude),
    досадный (annoying), обидный (offensive),
    трудный (difficult) ...

Secondary classes
  Increments: очень (very), совершенно (absolutely),
    никогда (never)* ...
  Polarity modifiers: не (not), нет (no),
    без (without) ...
  Polarity anti-modifiers: так (so), такой (such) ...





* In English lexicons, such words as 
never, nobody, etc. should be ranged into 
the polarity modifiers. In Russian lexicons, 
however, due to the occurrence of double 
negation in the Russian syntax, such words 
are not polarity modifiers, but increments. 

As Table 1 demonstrates, the sentiment
lexicon includes two primary classes:
positive and negative words denoting
positive and negative opinions, respectively.
It also includes three secondary classes:
increments, polarity modifiers, and
polarity anti-modifiers.

Increments are referred to as the
words intensifying the polarity of the other
words within a sentence without changing
it into the opposite one, e.g. in the contexts
Это очень надежный банк (This is a
very reliable bank) and Это очень плохие
условия кредита (These are very poor
credit terms), the word очень (very) is an
increment which intensifies the positive
and negative opinions, respectively.

Polarity modifiers are referred to as
the words which change the polarity of the
other words within a sentence into the
opposite one, e.g. in the context Сами
работники банка не грубые и не злые
(The bank operators themselves are not
rude and aggressive) the positive opinion is
expressed, though the context includes the
negative words грубые (rude), злые
(aggressive); the word не (not) is a polarity
modifier which changes their polarity into
the positive one.

Polarity anti-modifiers are referred
to as the words which cancel the change in
the polarity in spite of the occurrence of
polarity modifiers within a sentence.
Compare two contexts: 1) Меня никогда не
обманывали (I have never been cheated);
2) Меня никогда так не обманывали (I
have never been cheated in such a way). In
spite of almost complete similarity of the
words, these contexts express opposite
opinions: the positive one and the negative
one, respectively. The difference is that in
the first context, the word никогда (never)
implies never in this bank, and in the
second one it implies never except this bank.
The word так (such) is a polarity
anti-modifier, which cancels the change in the
polarity in the second example, and it
remains negative, as the context contains the
negative word обманывали (cheated).

To carry out the sentiment analysis, 
the REGEX algorithm was developed. The 
algorithm included 11 formal grammar 
rules and the corresponding syntactic mod- 
els, being a sort of regular expressions 
which detect certain text elements, simpli- 
fy each sentence, and present the text as a 
formal model. One of these rules is pre- 
sented below. 

Rule 1. If between the beginning of 
the sentence, or a punctuation mark, or a 
conjunction (and/or) and the next punctua- 
tion mark, or a conjunction (and/or), or the 
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end of the sentence, there is a polarity 
modifier, then the polarity of all the words 
referred to the sentiment lexicon within 
this segment is changed into the opposite 
one. The sequence of the elements (a po- 
larity modifier, a positive/negative word, 
any other word) does not matter. 

When formalized, the rule can be 
presented as below: 

<S>|<Z>|& {ALT, *, Any POS}
<Z>|&|</S>|<!/S>|<?/S>|<?!/S> →

→ <S>|<Z> * Any NEG
* <Z>|&|</S>|<!/S>|<?/S>|<?!/S> →

→ nNEG → -n

where <S> is the beginning of the 
sentence; 

| is the separator of equally admissible
alternatives;

<Z> is a punctuation mark; 

& is the and/or conjunction; 

ALT is a polarity modifier; 

POS is a word from the positive lexi- 
con, 

NEG is a word from the negative 
lexicon; 

* is any other word; 

{A, B, C} is the group of the ele- 
ments which can follow in any sequence; 

Any is any number of elements; 

</S>, <!/S>, <?/S>, <?!/S> are the
ends of the sentence with a full stop, an
exclamation mark, a question mark,
or both marks, respectively.

The REGEX algorithm included suc- 
cessive application of the substitution rules 
according to the priorities obtained from 
the experiments with the documents from 
the training set. For example, the applica- 
tion of Rule 1 resulted in the following 
layout conversion: 

Платежи проходят очень
быстро, деньги не зависают. (The payments
are processed very quickly, the money
does not get stuck)

<S> * POS <Z> * ALT NEG </S> →
→ <S> * POS <Z> * POS </S> →

→ 2POS → +2
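
The conversion above can be reproduced with a
short sketch. It is not the SENTIMENTO
implementation: the list-of-tags representation
and the function names apply_rule_1 and
draft_polarity are assumptions, and the
sentence boundary tags are left implicit (one
list is one sentence).

```python
# Segment boundaries inside a sentence: punctuation mark, and/or conjunction.
BOUNDARIES = {"<Z>", "&"}
FLIP = {"POS": "NEG", "NEG": "POS"}

def apply_rule_1(tags):
    """Flip POS/NEG in every segment that contains a polarity modifier (ALT)."""
    result, segment = [], []
    for tag in tags + ["<Z>"]:  # the sentinel closes the last segment
        if tag in BOUNDARIES:
            if "ALT" in segment:
                # The modifier is consumed; lexicon words change polarity.
                segment = [FLIP.get(t, t) for t in segment if t != "ALT"]
            result.extend(segment)
            result.append(tag)
            segment = []
        else:
            segment.append(tag)
    return result[:-1]  # drop the sentinel

def draft_polarity(tags):
    """Draft sentence polarity: number of POS tags minus number of NEG tags."""
    return tags.count("POS") - tags.count("NEG")

sentence = ["*", "POS", "<Z>", "*", "ALT", "NEG"]
converted = apply_rule_1(sentence)
print(converted, draft_polarity(converted))
```

For the example sentence, the second segment
loses its ALT tag, its NEG tag becomes POS,
and the draft polarity equals +2.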

At a certain step of the algorithm, the 
number of the POS and NEG wildcards 
was calculated in each sentence, then the 
draft sentence polarity was calculated (+2
in the example above). A group of rules
correcting the draft polarity was then
applied. The output of the REGEX algorithm
was the document polarity normalized to
the number of words in the document.
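
The normalization step can be sketched as
follows. The paper does not give the exact
formula, so dividing the sum of the sentence
polarities by the word count is an assumption,
as is the function name document_polarity.

```python
def document_polarity(sentence_polarities, word_count):
    """Sum of sentence polarities divided by the number of words (assumed formula)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return sum(sentence_polarities) / word_count

# Three sentences with polarities +2, -1 and 0 in a 20-word review.
score = document_polarity([2, -1, 0], 20)
print(score)
```

Under this assumption, a longer review needs
more opinion words to reach the same score as
a shorter one.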

To carry out the automatic sentiment 
analysis of the reviews, the SENTIMEN- 
TO system was implemented as an Internet 
application with an interface for the model 
testing and its adjustment (Brunova, E.G., 
Bidulya Yu.V. (2014) Algorithm with 
Formal Grammar Elements for Sentiment 
Analysis. Tyumen State University Herald. 
No. 1, in press). Fig. 1 demonstrates the 
window of the sentiment analysis module 
with the conclusion of the system. 





Fig.1. The SENTIMENTO software. The sentiment 
analysis module. 


The system lets its users confirm or reject
the system conclusion. For this purpose, the
Your conclusion request is displayed with two
buttons (Positive and Negative). The interface
for entering human conclusions is presented in
Fig.2. After the user presses a button, the
system checks whether the human conclusion
matches the system one. If it does, the
document is included in the database.
These results are also used to calculate
the system efficiency.

Fig.2. The interface for entering human conclusions


The efficiency of the proposed algo- 
rithm was evaluated in comparison with 
the efficiency of the Naive Bayes Classifi- 
er (Webb et al., 2005). 
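
For reference, a Naive Bayes classifier of the
kind used as the baseline can be sketched in a
few lines. The paper gives no details of its
configuration, so the multinomial model with
add-one smoothing, the class name NaiveBayes,
and the toy English training data below are
all assumptions.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Log prior of the class.
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokens:
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log(
                        (self.word_counts[c][w] + 1)
                        / (total_words + len(self.vocab))
                    )
            if score > best_score:
                best, best_score = c, score
        return best

# Toy training set (hypothetical stemmed reviews).
train_docs = [["polite", "fast"], ["fast", "clear"],
              ["rude", "slow"], ["slow", "difficult"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["polite", "clear"]))
```

Unlike the REGEX algorithm, such a classifier
treats a review as a bag of words and ignores
the position of polarity modifiers.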

The sentiment analysis experiments
with the SENTIMENTO software revealed
a number of problems, in particular,
concerning parametrical words. For instance, a
user evaluated the context Предлагают
маленький процент по вкладу (A small
deposit interest was offered) as negative,
while the system evaluated it as neutral,
since it did not detect any negative words
in it. As for the context Очередь была
совсем маленькая (The queue was quite
small), the human conclusion was positive,
while the system conclusion was negative,
as it detected a negative word очередь
(queue).

Thus, the behavior of parametrical 
words in reviews differs from that of nega- 
tive or positive words, and ignoring this 
fact leads to incorrect analysis results. 


3. RESULTS AND DISCUSSION 


Researchers notice that some words,
e.g. очень (very), совершенно (absolutely),
долго (long), медленно (slowly),
demonstrate their ambiguous nature in the
process of sentiment analysis. N.
Lukashevich and I. Chetverkin distinguish
operators affecting the semantic polarity;
however, their operators include negation
words (не (not), нет (no)) or adjective
increments (очень (very), самый
(most, least)) rather than adjectives themselves
(Lukashevich & Chetverkin, 2011: 77).
Nevertheless, the adjectives, adverbs, and
even nouns (e.g. максимум (maximum))
expressing the amount of a parameter
could be included in the sentiment lexicon.
Such words express the intensity of
the domain attribute, or parameter, e.g. the
client’s time saving.

Depending on the parameter, its increase
or decrease can express either a positive
or a negative opinion. For instance, the
word high spoken or written
about the speed of service (the parameter is
the client’s time saving) is evaluated as
positive, but the word high spoken or written
about the price or credit interest (the
parameter is the client’s money costs) is
evaluated as negative. It is the parameter
that determines the domain specificity of
such lexicon units.

To determine the main features of
parametrical words, the contexts containing
the words meaning large, small, long,
short, maximum, minimum, etc. were
extracted from a corpus of 70 client
reviews randomly selected from
(www.banki.ru). The study of these
contexts enabled the domain-specific
parameters to be determined.

Consider the parameters relevant for
the bank service quality domain. Below, one
context is cited for each parameter; the
parametrical words are underlined, and the
translation into English is given in
brackets:

Positive opinions

1) Increase in the parameter

a) The client’s positive emotions:
хочется отметить оперативность в
работе и готовность оказать _максимум_
помощи даже потенциальным
клиентам (I’d like to emphasize the speed
of operation and the readiness to offer the
maximum of help even to potential clients)

b) The client’s cost saving: Карта с
_немалым_ лимитом (The card with a
considerable limit)

c) The client’s time saving: Наш
кредит одобрили очень _быстро_ (Our
credit was approved very fast)

d) The sufficiency of service
information: _Много_ информации, листовки,
плакаты с рекламой (There is a lot of
information, there are advertising leaflets
and posters)

2) Decrease in the parameter

a) The client’s negative emotions:
_небольшой_ список замечаний (short list
of remarks)

b) The client’s money costs:
_маленький_ процент по кредиту (low
credit interest)

c) The client’s time consuming:
Очередь была совсем _маленькая_ (The
queue was quite small)

Negative opinions

1) Increase in the parameter

a) The client’s negative emotions:
хитрости для _большого_ обмана (tricks
for a great fraud)

b) The client’s money costs: Я и так
плачу _немалый_ процент за пользование
кредитом (Anyway, I pay a considerable
credit interest)

c) The client’s time consuming: банк
для тех, у кого _много_ лишнего времени
(The bank is for those who have much
spare time)

2) Decrease in the parameter

a) The client’s positive emotions:
толку _мало_ (little use)

b) The client’s cost saving: Лимит
_маленький_ (The credit limit is small)

c) The client’s time saving:
платежи проходят _медленно_ (The
payments are processed slowly)

d) The sufficiency of service
information: информации _мало_ (there is little
information)

The extracted parameters are summa- 
rized in the diagram (Fig. 3). 

[Diagram; legible labels: Positive opinion,
Negative opinion, Client's positive emotions,
Client's cost saving, Sufficiency of service
information]

Fig. 3. The parameters of the sentiment analysis for
bank service quality extracted from the review
contexts


As can be seen from the parameters
determined (Fig. 3), the increase of a
certain parameter results in a negative or a
positive opinion, and the decrease of the
same parameter results in the opposite
opinion. For instance, the increase in the
client’s time saving evokes positive
emotions and results in positive opinions; its
decrease results in negative emotions and
negative opinions. On the other hand, the
increase in the client’s money costs results
in negative emotions and negative
opinions; its decrease evokes positive emotions
and results in positive opinions. Thus,
parametrical words are not only
domain-specific, but they demonstrate their
ambiguous nature even within a single domain.
This is confirmed by their occurrence
within the same, mainly negative, context, cf.
Много слов, но мало дела (There are
many words, but little work); Дают
быстро, отдают долго (They give
quickly, but return slowly); Большой
минус и маленький плюс (A large minus
and a small plus).
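
The relation between the direction of change
and the resulting opinion can be written down
as a sign product. The sketch below only
restates the list of parameters given above;
the +1/-1 encoding and the names
PARAMETER_VALENCE, DIRECTION, and opinion
are assumptions made for illustration.

```python
# +1: more of the parameter is good for the client; -1: more is bad.
PARAMETER_VALENCE = {
    "positive emotions": +1,
    "cost saving": +1,
    "time saving": +1,
    "sufficiency of service information": +1,
    "negative emotions": -1,
    "money costs": -1,
    "time consuming": -1,
}
DIRECTION = {"increase": +1, "decrease": -1}

def opinion(parameter, direction):
    """Return the opinion implied by a change of a domain parameter."""
    sign = PARAMETER_VALENCE[parameter] * DIRECTION[direction]
    return "positive" if sign > 0 else "negative"

print(opinion("time saving", "increase"))  # approved very fast
print(opinion("money costs", "increase"))  # considerable credit interest
print(opinion("time consuming", "decrease"))  # the queue was quite small
```

The same parametrical word (e.g. one meaning
small) therefore yields opposite opinions for
parameters of opposite valence.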

The results of the analysis demonstrate
that ignoring parametrical words in
sentiment analysis leads to incorrect
conclusions, so they should be included in
the sentiment lexicon and ranged into one
of its classes.

Only a small number of parametrical
words can be ranged into the primary
classes, e.g. the word быстро (fast) is ranged
into the positive class, and the words долго
(long) and медленно (slowly) are ranged
into the negative one; this ranging is
definitely domain-specific.

The parametrical words denoting the
increase of a parameter (meaning large,
many, much, maximum, etc.) should be
ranged into the increment class along with
the words meaning very, absolutely, etc.,
as they intensify positive or negative
emotions. As mentioned above, increments
are the words intensifying the polarity
of the other words within a sentence
without changing it into the opposite one.
The parametrical words denoting the
decrease of a parameter (meaning small,
little, few, minimum, etc.) should be ranged
into a new class, which may be referred to as
the decrement class. Decrements are the
words decreasing the polarity of the other
words within a sentence without changing
it into the opposite one. Thus, most
parametrical words are ranged into the
secondary classes; this means that they do not
express the opinion directly, but affect the
intensity of the opinion expressed by other
words.
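
The effect of the two secondary classes can be
sketched as a scaling of the draft polarity.
The paper defines increments and decrements
only qualitatively, so the numeric factors,
the INC and DEC tags, and the function name
segment_score are hypothetical.

```python
INCREMENT_FACTOR = 1.5  # hypothetical: an increment intensifies the polarity
DECREMENT_FACTOR = 0.5  # hypothetical: a decrement reduces the polarity

def segment_score(tags):
    """Score a segment tagged with POS, NEG, INC, DEC, or * (any other word)."""
    base = tags.count("POS") - tags.count("NEG")
    scale = (INCREMENT_FACTOR ** tags.count("INC")
             * DECREMENT_FACTOR ** tags.count("DEC"))
    # Both factors are positive, so the sign of the opinion never changes.
    return base * scale

print(segment_score(["INC", "POS"]))  # intensified positive opinion
print(segment_score(["DEC", "NEG"]))  # reduced negative opinion
```

Because the factors are positive, neither class
changes the polarity into the opposite one,
which matches the definitions given above.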

The revised structure of the sentiment
lexicon is presented in Table 2; the
parametrical words are underlined.

Table 2. The revised structure of the sentiment lexicon (bank service quality)

Primary classes
  Positive: безопасный (safe), бесплатный (free),
    вежливый (polite), компетентный (competent),
    четкий (clear), эффективный (efficient),
    _быстро_ (fast) ...
  Negative: агрессивный (aggressive),
    безвыходный (hopeless), грубый (rude),
    досадный (annoying), обидный (offensive),
    трудный (difficult), _долго_ (long),
    _медленно_ (slowly) ...

Secondary classes
  Increments: очень (very), совершенно (absolutely),
    никогда (never), нигде (nowhere),
    _много_ (much, many), _максимум_ (maximum),
    _большой_ (large), _высокий_ (high) ...
  Decrements: _мало_ (little, few),
    _минимум_ (minimum), _маленький_ (small),
    _низкий_ (low) ...
  Polarity modifiers: не (not), нет (no),
    без (without) ...
  Polarity anti-modifiers: так (so), такой (such) ...





4. CONCLUSION 


The general features of parametrical 
words within the sentiment lexicon are de- 
termined. The structure of the sentiment 
lexicon is revised; a new class (decre- 
ments) is added. 

The results of this research demonstrate
that the behavior of most parametrical
words in reviews differs from that of
negative or positive words, and ignoring
this fact leads to incorrect sentiment
analysis results. Parametrical words
generally express an implicit opinion: they do
not express the opinion directly, but affect
the intensity of the opinion expressed by
other words. Moreover, the parameters
themselves are not usually named directly in
reviews.

Parametrical words should be included
in the sentiment lexicon as follows:

1) A small number of parametrical
words can be ranged into the primary
classes (positive or negative), but this ranging
is domain-specific. It is the parameter that
determines the domain specificity of such
words.

2) Most parametrical words are
ranged into the secondary classes
(increments or decrements), and this ranging can
be considered universal.

Thus, evident progress toward
sentiment lexicon universalization can be
achieved by classifying parametrical
words within the sentiment lexicon.